Abstract

We study the permutation complexity of finite-state stationary 
stochastic processes based on a duality between values and orderings 
between values. First, we establish a duality between the set of all 
words of a fixed length and the set of all permutations of the same 
length. Second, on this basis, we give an elementary alternative proof
of the equality between the permutation entropy rate and the entropy
rate for a finite-state stationary stochastic process first proved in
[Amigo, J.M., Kennel, M. B., Kocarev, L., 2005. Physica D 210, 77-95].
Third, we show that further information on the relationship between
the structure of values and the structure of orderings for finite-state
stationary stochastic processes beyond the entropy rate can be obtained
from the established duality. In particular, we prove that the
permutation excess entropy is equal to the excess entropy, which is a
measure of global correlation present in a stationary stochastic process,
for finite-state stationary ergodic Markov processes.

Keywords: Permutation entropy; Excess entropy; Duality; Stationary
stochastic processes; Ergodic Markov processes

1 Introduction

One of the most intriguing recent findings in the science of complexity is
that much of the information contained in stationary time series can be
captured by orderings between values [1]. Bandt and Pompe [2] first
introduced the notion of permutation entropy, which quantifies the average
uncertainty of orderings between values per time unit, in contrast to the
entropy rate for stationary stochastic processes or the Kolmogorov-Sinai
entropy for dynamical systems, both of which quantify the average
uncertainty of values per time unit. Bandt et al. [3] proved that the
permutation entropy is equal to the Kolmogorov-Sinai entropy for piecewise
monotone maps on one-dimensional intervals. Amigo et al. [4] showed that
the permutation entropy rate is equal to the entropy rate for any
finite-state stationary stochastic process (see footnote 1). They also
generalized the results of [3] to ergodic maps on intervals of arbitrary
dimensions by considering the limits of finite-state stationary stochastic
processes. Keller and Sinn [5] generalized the results of [3] by a
different approach. The topological permutation entropy was also studied by
Bandt et al. [3], Misiurewicz [6] and Amigo and Kennel [7].

In this paper, we study the permutation complexity of finite-state
stationary stochastic processes based on a duality between values and
orderings between values. Orderings between values induce a coarse-graining
of the set of all words of a fixed length. Namely, two words are mapped to
the same ordering (permutation) if the order relationships between values in
both words are the same. In the case of shift maps on the unit interval,
Elizalde [8] performed enumerations associated with such a coarse-graining.
In our case, the enumeration is similar to, but much simpler than, that of
[8]. However, we emphasize a dual structure existing between the set of all
words of a fixed length and the set of all permutations of the same length.
Indeed, we show that there is a kind of minimal realization map from the
latter to the former. By introducing suitable partial orders on both sets,
we can make the pair of the coarse-graining map and the minimal realization
map form a Galois connection [9], which is a categorical adjunction [10]
between partially ordered sets. We present an elementary alternative proof
of the equality between the permutation entropy rate and the entropy rate
based on the duality between values and orderings.

1 Amigo et al. stated that the equality holds for finite-state stationary
ergodic processes in Theorem 2 and that an inequality holds for the
non-ergodic case in Theorem 6 in [4]. However, a careful examination of
their proof shows that they actually proved the equality for any
finite-state stationary stochastic process. This point is corrected in
Amigo's recent book [1].

We can study the relationship between the structure of values and the
structure of orderings for finite-state stationary stochastic processes
beyond the entropy rate equality if we make use of the duality between
values and orderings in more depth. Here, we consider the excess entropy,
which is a measure of global correlation present in finite-state stationary
stochastic processes. The excess entropy has a long history in the study of
complex systems [11, 12, 13]. However, it is still of recent research
interest. For example, Feldman et al. [14] proposed entropy-complexity
diagrams based on the entropy rate and the excess entropy to analyze
various types of natural information processing. We define the permutation
excess entropy and show that the permutation excess entropy is equal to the
excess entropy for finite-state stationary ergodic Markov processes. We
also present a simple non-ergodic counter-example with a strict inequality.

Let us give a rough sketch of our proof strategy for the main results.
Let $\phi$ be the coarse-graining map sending each word of length $L$
($\geq 1$) from a finite alphabet to its associated permutation of length
$L$. Given a finite-state stationary stochastic process, only permutations
$\pi$ such that the size of $\phi^{-1}(\pi)$ is greater than 1 may
contribute to the difference between the entropy rate and the permutation
entropy rate of the process. If we denote the probability that those
permutations occur by $q_L$, then we can show that the difference
($\geq 0$) before the normalization (division by $L$) and taking the limit
$L \to \infty$ is bounded from above by the probability $q_L$ multiplied by
a function of $L$ whose growth rate is $\log L$, by using the fact that the
size of $\phi^{-1}(\pi)$ is given by a binomial coefficient depending on $L$
for any permutation $\pi$ of length $L$ (Lemma 10). The equality between
the entropy rate and the permutation entropy rate is immediate from this
bound (Theorem 11). Furthermore, if the process is ergodic Markov, then we
can show that $q_L$ diminishes exponentially fast as $L \to \infty$ by using
a characterization of words $s_1^L$ such that $\phi^{-1}(\pi) = \{s_1^L\}$
for some $\pi$ and the irreducibility of the associated transition matrix.
This leads to the equality between the excess entropy and the permutation
excess entropy (Theorem 14). We note that those words $s_1^L$ such that
$\phi^{-1}(\pi) = \{s_1^L\}$ for some $\pi$ can be seen as a special type of
"stable objects" under the duality between the coarse-graining map $\phi$
and the minimal realization map (Theorem 9 (iii)).

This paper is organized as follows. In Section 2, we establish the duality
between values and orderings. In Section 3, we give a proof of the equality
between the permutation entropy rate and the entropy rate for finite-state
stationary stochastic processes based on the duality. In Section 4, we prove
the equality between the permutation excess entropy and the excess entropy
for finite-state stationary ergodic Markov processes and give a non-ergodic
counter-example with a strict inequality.

2 Duality between Values and Orderings 

In this section we establish the duality between values and orderings.

2.1 Permutations and Rank Sequences

Let $A$ be an alphabet. We consider the case that the cardinality $|A|$ of
$A$ is finite or countably infinite. If $|A| = n$ ($n = 1, 2, \dots$), then
we write $A = A_n = \{1, 2, \dots, n\}$. If $n = \infty$, then
$A = A_\infty$ is identified with the set of all natural numbers
$\mathbb{N} = \{1, 2, 3, \dots\}$. We regard $A_n$
($n = 1, 2, \dots, \infty$) not just as a set, but as a totally ordered set
ordered by the 'less-than-or-equal-to' relationship $\leq$ between natural
numbers. In the following discussion, if we write just $A$, then $A$ can be
either $A_n$ or $A_\infty = \mathbb{N}$.

Let $A^L = \underbrace{A \times \cdots \times A}_{L}$ for $L \geq 1$. Each
element $w \in A^L$ is called a word of length $L$. If
$w = (s_1, \dots, s_L) \in A^L$, then we write
$w = s_1 \cdots s_L = s_1^L$.

Let $\mathcal{S}_L$ be the set of all permutations of length $L$, namely,
$\mathcal{S}_L$ is the set of all bijections on the set
$\{1, 2, \dots, L\}$. For $s_1^L \in A^L$ and $\pi \in \mathcal{S}_L$, we say
that $s_1^L$ is of type $\pi$ if we have $s_{\pi(i)} \leq s_{\pi(i+1)}$, and
$\pi(i) < \pi(i+1)$ when $s_{\pi(i)} = s_{\pi(i+1)}$, for
$i = 1, 2, \dots, L-1$. For example, $\pi(1)\pi(2)\pi(3)\pi(4)\pi(5) = 24315$
for $s_1^5 = 31213$ because $s_2 s_4 s_3 s_1 s_5 = 11233$.

Each word $s_1^L \in A^L$ has a unique permutation type
$\pi \in \mathcal{S}_L$. Hence, the correspondence $s_1^L \mapsto \pi$
defines a many-to-one (in general) map $\phi : A^L \to \mathcal{S}_L$, which
coarse-grains the set $A^L$ of words of length $L$ by their permutation
types.
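The permutation type can be computed by a stable sort of the indices, since a stable sort keeps equal values in increasing index order, which is the tie-breaking rule in the definition. The following is a minimal sketch; the function name `perm_type` and the tuple representation of words and permutations are assumptions made for illustration only.

```python
def perm_type(word):
    """Return the permutation type (pi(1), ..., pi(L)) of a word, 1-based.

    `perm_type` is a hypothetical helper name, not notation from the paper.
    """
    # Python's sorted() is stable: indices carrying equal values stay in
    # increasing order, i.e. pi(i) < pi(i+1) whenever s_pi(i) = s_pi(i+1).
    order = sorted(range(len(word)), key=lambda i: word[i])
    return tuple(i + 1 for i in order)


# The example from the text: the word 31213 has type 24315.
example = perm_type([3, 1, 2, 1, 3])
```

Distinct words can share a type: `perm_type([1, 1])` and `perm_type([1, 2])` both give `(1, 2)`, which is the coarse-graining at work.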

We make use of the notion of rank sequence introduced in [3]. In some
situations, discussions become easier if we use rank sequences instead of
permutations. However, as far as the authors are aware, their compatibility
with the map $\phi$ sending words to associated permutations has not been
presented explicitly so far. Hence, it is worthwhile to study them here.

A word $r_1^L \in \mathbb{N}^L$ is called a rank sequence of length $L$ if
it satisfies $1 \leq r_i \leq i$ for $i = 1, \dots, L$. We denote the set of
all rank sequences of length $L$ by $\mathcal{R}_L$. Note that there exists
a bijection between $\mathcal{S}_L$ and $\mathcal{R}_L$ because
$|\mathcal{S}_L| = L! = |\mathcal{R}_L|$.

Each word $s_1^L \in A^L$ gives rise to a rank sequence
$r_1^L \in \mathcal{R}_L$ in the following way:

\[ r_i = \sum_{j=1}^{i} \delta(s_j \leq s_i), \quad i = 1, \dots, L, \]

where $\delta(X) = 1$ if the proposition $X$ is true, otherwise
$\delta(X) = 0$. Namely, $r_i$ is the number of indices $j$
($1 \leq j \leq i$) such that $s_j \leq s_i$. This correspondence
$s_1^L \mapsto r_1^L$ defines a map $\varphi : A^L \to \mathcal{R}_L$.

In the following discussion, we will show that there is a bijection
$\iota : \mathcal{R}_L \to \mathcal{S}_L$ such that
$\iota \circ \varphi = \phi$, namely, the following diagram commutes:

\[
\begin{array}{ccc}
A^L & \xrightarrow{\ \phi\ } & \mathcal{S}_L \\
 & {\scriptstyle \varphi} \searrow & \uparrow {\scriptstyle \iota} \\
 & & \mathcal{R}_L
\end{array}
\]

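Given a rank sequence $r_1^L \in \mathcal{R}_L$, we define a permutation
$\iota(r_1^L) = \pi \in \mathcal{S}_L$ inductively as follows: first, we
define $\pi(1) = \max\{ i \mid r_i = 1, \ 1 \leq i \leq L \}$. $\pi(1)$ is
well-defined because we have $r_1 = 1$. Second, we define

\[ \pi(2) = \max\{ i \mid r_i^{(1)} = \min_{j \neq \pi(1)} \{ r_j^{(1)} \} \}, \]

where $r_1^{(1)} \cdots r_L^{(1)}$ is a rank sequence defined by

\[
r_i^{(1)} =
\begin{cases}
r_i - 1 & \text{if } i > \pi(1), \\
r_i & \text{otherwise.}
\end{cases}
\]

In general, we define

\[ \pi(k) = \max\{ i \mid r_i^{(k-1)} = \min_{j \neq \pi(1), \dots, \pi(k-1)} \{ r_j^{(k-1)} \} \} \]

for $k = 2, \dots, L$, where $r_1^{(k-1)} \cdots r_L^{(k-1)}$ is a rank
sequence defined by

\[
r_i^{(k-1)} =
\begin{cases}
r_i^{(k-2)} - 1 & \text{if } i > \pi(k-1) \text{ and } i \neq \pi(1), \dots, \pi(k-2), \\
r_i^{(k-2)} & \text{otherwise,}
\end{cases}
\]

and $r_i^{(0)} = r_i$. Here, the maximum is taken over the indices $i$ other
than $\pi(1), \dots, \pi(k-1)$. By construction, this procedure defines a
unique permutation $\iota(r_1^L) = \pi \in \mathcal{S}_L$.

For example, consider $r_1^5 = 11342 \in \mathcal{R}_5$.
$\pi = \iota(11342) \in \mathcal{S}_5$ is obtained by the following
calculations:

\[
\begin{aligned}
\pi(1) &= \max\{ i \mid r_i = 1 \} = 2, & (r^{(1)})_1^5 &= 11231, \\
\pi(2) &= \max\{ i \mid r_i^{(1)} = \min_{j \neq 2} \{ r_j^{(1)} \} \} = 5, & (r^{(2)})_1^5 &= 11231, \\
\pi(3) &= \max\{ i \mid r_i^{(2)} = \min_{j \neq 2, 5} \{ r_j^{(2)} \} \} = 1, & (r^{(3)})_1^5 &= 11121, \\
\pi(4) &= \max\{ i \mid r_i^{(3)} = \min_{j \neq 1, 2, 5} \{ r_j^{(3)} \} \} = 3, & (r^{(4)})_1^5 &= 11111, \\
\pi(5) &= \max\{ i \mid r_i^{(4)} = \min_{j \neq 1, 2, 3, 5} \{ r_j^{(4)} \} \} = 4.
\end{aligned}
\]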
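The inductive definition of $\iota$ can be followed step by step in code. The sketch below is an illustration under assumptions: the helper names `rank_sequence`, `iota` and `perm_type` are hypothetical, words and permutations are stored as tuples, and already chosen indices are excluded from both the minimum and the maximum, which is how the worked example above proceeds. The final loop checks the relation $\iota \circ \varphi = \phi$ exhaustively on a small alphabet.

```python
from itertools import product


def perm_type(word):
    # Permutation type via a stable sort (hypothetical helper name).
    return tuple(i + 1 for i in sorted(range(len(word)), key=lambda i: word[i]))


def rank_sequence(word):
    # r_i = number of indices j <= i with s_j <= s_i (s_i itself is counted).
    return tuple(
        sum(1 for j in range(i + 1) if word[j] <= word[i])
        for i in range(len(word))
    )


def iota(ranks):
    # Inductive construction of pi from a rank sequence.
    r = list(ranks)
    chosen = []
    for _ in range(len(r)):
        remaining = [i for i in range(len(r)) if i not in chosen]
        smallest = min(r[i] for i in remaining)
        pick = max(i for i in remaining if r[i] == smallest)
        # Decrease the ranks of the not yet chosen indices to the right of pick.
        for i in remaining:
            if i > pick:
                r[i] -= 1
        chosen.append(pick)
    return tuple(i + 1 for i in chosen)


# Exhaustive check of iota(varphi(s)) == phi(s) for words of length 4 over {1, 2, 3}.
all_agree = all(
    iota(rank_sequence(w)) == perm_type(w)
    for w in product(range(1, 4), repeat=4)
)
```

For the example in the text, `iota((1, 1, 3, 4, 2))` gives `(2, 5, 1, 3, 4)`.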





Lemma 1

\[ \min_{j \neq \pi(1), \dots, \pi(k-1)} \{ r_j^{(k-1)} \} = 1 \]

for $k = 1, 2, \dots, L$.

Proof. It is sufficient to show that $r_j^{(k-1)} = 1$ for some
$j \neq \pi(1), \dots, \pi(k-1)$. Consider the minimum index $j$ such that
$j \notin \{ \pi(1), \dots, \pi(k-1) \}$. Then, we have
$r_j^{(k-1)} = r_j - (j-1)$ because
$\{1, \dots, j-1\} \subseteq \{ \pi(1), \dots, \pi(k-1) \}$. However,
$1 \leq r_j \leq j$ and $r_j^{(k-1)} \geq 1$ by construction. Hence,
$r_j = j$ and we obtain $r_j^{(k-1)} = 1$.

□



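Proposition 2 The map $\iota : \mathcal{R}_L \to \mathcal{S}_L$ is a
bijection.

Proof. It is sufficient to show that $\iota$ is injective because
$|\mathcal{R}_L| = |\mathcal{S}_L| = L! < \infty$. Assume that
$\iota(r_1^L) = \iota(\tilde{r}_1^L) = \pi$ for
$r_1^L, \tilde{r}_1^L \in \mathcal{R}_L$, $\pi \in \mathcal{S}_L$. We have
$r_i^{(L-1)} = \tilde{r}_i^{(L-1)} = 1$ for $i = 1, \dots, L$ by Lemma 1
because $r_{\pi(k)}^{(k-1)} = r_{\pi(k)}^{(L-1)}$ and
$\tilde{r}_{\pi(k)}^{(k-1)} = \tilde{r}_{\pi(k)}^{(L-1)}$ for
$k = 1, \dots, L$. We can reconstruct both $r_1^L$ and $\tilde{r}_1^L$ from
$(\bar{r}^{(L-1)})_1^L := (r^{(L-1)})_1^L = (\tilde{r}^{(L-1)})_1^L = 1 \cdots 1$
($L$ ones) by the following procedure: first, we add 1 to the $\pi(L)$-th 1
in $1 \cdots 1$ if $\pi(L) > \pi(L-1)$, and do nothing otherwise. The
obtained sequence $(\bar{r}^{(L-2)})_1^L$ is identical to both
$(r^{(L-2)})_1^L$ and $(\tilde{r}^{(L-2)})_1^L$ because
$\iota(r_1^L) = \iota(\tilde{r}_1^L) = \pi$. Second, we add 1 to
$\bar{r}_{\pi(L)}^{(L-2)}$ if $\pi(L) > \pi(L-2)$, and do nothing otherwise,
and add 1 to $\bar{r}_{\pi(L-1)}^{(L-2)}$ if $\pi(L-1) > \pi(L-2)$, and do
nothing otherwise. If we call the obtained sequence
$(\bar{r}^{(L-3)})_1^L$, then we have
$(\bar{r}^{(L-3)})_1^L = (r^{(L-3)})_1^L = (\tilde{r}^{(L-3)})_1^L$. In
general, if we define

\[
\bar{r}_i^{(L-k)} =
\begin{cases}
\bar{r}_i^{(L-(k-1))} + 1 & \text{if } i \in \{ \pi(L-(k-2)), \dots, \pi(L) \} \text{ and } i > \pi(L-(k-1)), \\
\bar{r}_i^{(L-(k-1))} & \text{otherwise}
\end{cases}
\]

for $k = 2, \dots, L$, then we have
$(\bar{r}^{(L-k)})_1^L = (r^{(L-k)})_1^L = (\tilde{r}^{(L-k)})_1^L$. In
particular, we obtain $(\bar{r}^{(0)})_1^L = r_1^L = \tilde{r}_1^L$ for
$k = L$.

□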



Proposition 3 $\iota \circ \varphi = \phi$.

Proof. We have to show that $\iota(\varphi(s_1^L)) = \phi(s_1^L)$ for any
word $s_1^L \in A^L$. Put $\pi = \phi(s_1^L)$,
$\tilde{\pi} = \iota(\varphi(s_1^L))$ and $r_1^L = \varphi(s_1^L)$. We shall
show that $\pi(k) = \tilde{\pi}(k)$ for $k = 1, \dots, L$ inductively.
First, we show that $\pi(1) = \tilde{\pi}(1)$. By the definition of $\phi$
and $\iota$, $\pi(1)$ is the index $i$ of the minimum-leftmost $s_i$ and
$\tilde{\pi}(1)$ is the maximum index $i$ such that $r_i = 1$. We have

\[ r_i = 1 \iff s_j > s_i \text{ for } j = 1, \dots, i-1 \]

by the definition of rank sequences. Hence, $s_j > s_{\tilde{\pi}(1)}$ for
$j = 1, \dots, \tilde{\pi}(1) - 1$. On the other hand, we have
$s_{\tilde{\pi}(1)} \leq s_j$ for $j = \tilde{\pi}(1), \dots, L$. Indeed, if
there exists $j > \tilde{\pi}(1)$ such that $s_{\tilde{\pi}(1)} > s_j$, then
$r_j > 1$ must hold because $\tilde{\pi}(1)$ is the maximum index $i$ such
that $r_i = 1$. Hence, there exists $j_1 < j$ such that
$s_{j_1} \leq s_j$. If $j_1 < \tilde{\pi}(1)$, then this contradicts
$s_k > s_{\tilde{\pi}(1)}$ for $k = 1, \dots, \tilde{\pi}(1) - 1$, and
$j_1 = \tilde{\pi}(1)$ is excluded by $s_{\tilde{\pi}(1)} > s_j$. So, we
have $\tilde{\pi}(1) < j_1 < j$. Since
$s_{\tilde{\pi}(1)} > s_j \geq s_{j_1}$, the same argument can be applied to
$j_1$ instead of $j$. Thus, we obtain a strictly decreasing infinite
sequence of indices $j_1, j_2, \dots$ such that
$\tilde{\pi}(1) < \cdots < j_2 < j_1 < j$. However, this is impossible
because the number of indices between $\tilde{\pi}(1)$ and $j$ is finite.
Therefore, $s_{\tilde{\pi}(1)}$ is the minimum-leftmost value in $s_1^L$,
which implies that $\tilde{\pi}(1) = \pi(1)$.

Now, suppose that $\pi(1) = \tilde{\pi}(1), \dots, \pi(k) = \tilde{\pi}(k)$,
where $1 \leq k \leq L-1$. We would like to show that
$\pi(k+1) = \tilde{\pi}(k+1)$. By the definition of $\phi$ and $\iota$,
$\pi(k+1)$ is the index $i$ of the minimum-leftmost $s_i$ except for
$s_{\pi(1)}, \dots, s_{\pi(k)}$ and $\tilde{\pi}(k+1)$ is the maximum index
$i$ such that $r_i^{(k)} = 1$ except for
$\tilde{\pi}(1), \dots, \tilde{\pi}(k)$, where we have
$\pi(1) = \tilde{\pi}(1), \dots, \pi(k) = \tilde{\pi}(k)$ by the assumption
of the mathematical induction. For an appropriate permutation
$(i_1, \dots, i_k)$ of $(1, \dots, k)$, we have

\[ \pi(i_1) < \cdots < \pi(i_m) < \tilde{\pi}(k+1) < \pi(i_{m+1}) < \cdots < \pi(i_k). \]

It must hold that $r_{\tilde{\pi}(k+1)} = m + 1$ because
$r_{\tilde{\pi}(k+1)}^{(k)} = 1$. The number of indices $j$ for
$j = 1, \dots, \tilde{\pi}(k+1) - 1$ such that
$s_j \leq s_{\tilde{\pi}(k+1)}$ is $m$ by the definition of
$r_{\tilde{\pi}(k+1)}$. On the other hand, we have
$s_{\pi(i_1)}, \dots, s_{\pi(i_m)} \leq s_{\tilde{\pi}(k+1)}$ by the
definition of $\pi$. Hence, the equality

\[ \{ j \mid s_j \leq s_{\tilde{\pi}(k+1)} \text{ and } 1 \leq j < \tilde{\pi}(k+1) \} = \{ \pi(i_1), \dots, \pi(i_m) \} \]

holds. Thus, if $j \neq \pi(i_1), \dots, \pi(i_m)$ and
$1 \leq j < \tilde{\pi}(k+1)$, then we have $s_j > s_{\tilde{\pi}(k+1)}$.
This implies that $\tilde{\pi}(k+1) \leq \pi(k+1)$ because if
$\pi(k+1) < \tilde{\pi}(k+1)$, then $s_{\pi(k+1)} > s_{\tilde{\pi}(k+1)}$,
which contradicts the assumption that $s_{\pi(k+1)}$ takes the minimum value
except for $s_{\pi(1)}, \dots, s_{\pi(k)}$. For the other inequality, assume
that $\pi(i_{m'}) < \pi(k+1) < \pi(i_{m'+1})$. We have $s_j > s_{\pi(k+1)}$
for $1 \leq j < \pi(k+1)$ with $j \neq \pi(i_1), \dots, \pi(i_{m'})$ because
$s_{\pi(k+1)}$ takes the minimum-leftmost value except for
$s_{\pi(1)}, \dots, s_{\pi(k)}$. On the other hand, it follows that
$s_{\pi(i_1)}, \dots, s_{\pi(i_{m'})} \leq s_{\pi(k+1)}$ by the definition
of $\pi$. Hence, we have
$r_{\pi(k+1)} = \sum_{j=1}^{\pi(k+1)} \delta(s_j \leq s_{\pi(k+1)}) = m' + 1$,
which implies that $r_{\pi(k+1)}^{(k)} = 1$. Thus, we obtain
$\pi(k+1) \leq \tilde{\pi}(k+1)$ because $\tilde{\pi}(k+1)$ is the maximum
index $i$ such that $r_i^{(k)} = 1$ except for
$\tilde{\pi}(1), \dots, \tilde{\pi}(k)$.

□

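Corollary 4 For $s_1^L, t_1^L \in A^L$, the following statements are
equivalent:

(i) $\phi(s_1^L) = \phi(t_1^L)$.

(ii) For all $1 \leq k \leq j \leq L$, $s_k \leq s_j \iff t_k \leq t_j$.

Proof. Assume that $\phi(s_1^L) = \phi(t_1^L) = \pi \in \mathcal{S}_L$.
Then, we have
$s_{\pi(1)} \leq s_{\pi(2)} \leq \cdots \leq s_{\pi(L)}$ and
$t_{\pi(1)} \leq t_{\pi(2)} \leq \cdots \leq t_{\pi(L)}$, with ties in both
words broken by increasing index. Hence, (ii) holds. For the reverse
direction, assume that (ii) holds. Then, we have
$\sum_{k=1}^{j} \delta(s_k \leq s_j) = \sum_{k=1}^{j} \delta(t_k \leq t_j)$
for any $1 \leq j \leq L$, which implies
$\varphi(s_1^L) = \varphi(t_1^L)$. Hence, we have
$\phi(s_1^L) = \iota \circ \varphi(s_1^L) = \iota \circ \varphi(t_1^L) = \phi(t_1^L)$.

□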
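Corollary 4 lends itself to an exhaustive check on a small alphabet. The sketch below compares, for every pair of words of length 4 over $\{1, 2, 3\}$, equality of permutation types with agreement of the order relations $s_k \leq s_j$ for $k \leq j$. The helper names `perm_type` and `same_order_relations` are hypothetical, and the alphabet size and word length are arbitrary small values chosen for illustration.

```python
from itertools import product


def perm_type(word):
    # Permutation type via a stable sort (hypothetical helper name).
    return tuple(i + 1 for i in sorted(range(len(word)), key=lambda i: word[i]))


def same_order_relations(s, t):
    # Condition (ii): s_k <= s_j holds exactly when t_k <= t_j, for all k <= j.
    return all(
        (s[k] <= s[j]) == (t[k] <= t[j])
        for j in range(len(s))
        for k in range(j + 1)
    )


words = list(product(range(1, 4), repeat=4))

# Pairs on which statements (i) and (ii) of Corollary 4 would disagree.
mismatches = [
    (s, t)
    for s in words
    for t in words
    if (perm_type(s) == perm_type(t)) != same_order_relations(s, t)
]
```

An empty `mismatches` list is consistent with the corollary on this finite sample; it is of course not a proof.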
2.2 The Coarse-Graining Map $\phi$

Now, we are ready to study properties of the coarse-graining map
$\phi : A^L \to \mathcal{S}_L$ in detail.

Lemma 5 Let $\pi \in \mathcal{S}_L$. Assume that there is no
$s_1^L \in A_{i-1}^L$ such that $\phi(s_1^L) = \pi$, but there exists
$s_1^L \in A_i^L$ such that $\phi(s_1^L) = \pi$ for some $i \geq 1$ (when
$i = 1$ we define $A_{i-1} = A_0 = \emptyset$).

(i) There exists a unique $s_1^L \in A_i^L$ such that $\phi(s_1^L) = \pi$.
Moreover, if $\phi(t_1^L) = \pi$ for $t_1^L \in A_n^L$ and $n \geq i$, then
there exist $c_1, \dots, c_L$ such that $s_k + c_k = t_k$ for
$k = 1, \dots, L$ and
$0 \leq c_{\pi(1)} \leq \cdots \leq c_{\pi(L)} \leq n - i$.

(ii) $|\phi^{-1}(\pi)| = \binom{L+n-i}{n-i}$, where $n \geq i$ and the
domain of $\phi$ is set to $A_n^L$.


Proof. (i): First, we prove the uniqueness. If $i = 1$, then we have nothing
to do. So, we assume that $i \geq 2$. Suppose that
$\phi(s_1^L) = \phi(t_1^L) = \pi$ and $s_1^L, t_1^L \in A_i^L$. If
$s_1^L \neq t_1^L$, then there exists $j$ such that
$s_{\pi(j)} \neq t_{\pi(j)}$. We can assume that $s_{\pi(j)} < t_{\pi(j)}$
without loss of generality. Let us define a word $u_1^L$ by

\[
u_{\pi(k)} =
\begin{cases}
s_{\pi(k)} & k = 1, \dots, j-1, \\
t_{\pi(k)} - 1 & k = j, \dots, L.
\end{cases}
\]

We claim that $\phi(u_1^L) = \pi$. Indeed, it is clear that we have
$u_{\pi(k-1)} \leq u_{\pi(k)}$, and $\pi(k-1) < \pi(k)$ when
$u_{\pi(k-1)} = u_{\pi(k)}$, for $k \neq j$. When $k = j$, we have
$s_{\pi(j-1)} \leq s_{\pi(j)} \leq t_{\pi(j)} - 1$ by the assumption.
Suppose that $s_{\pi(j-1)} = t_{\pi(j)} - 1$. It follows that
$s_{\pi(j-1)} = s_{\pi(j)}$, which implies that $\pi(j-1) < \pi(j)$. Thus,
we have $\phi(u_1^L) = \pi$. However, this contradicts the assumption that
there is no $s_1^L \in A_{i-1}^L$ such that $\phi(s_1^L) = \pi$ because
$u_1^L \in A_{i-1}^L$.

Next, suppose that $\phi(t_1^L) = \pi$ for $t_1^L \in A_n^L$, $n \geq i$.
Let us show that $s_{\pi(k)} \leq t_{\pi(k)}$ for $k = 1, \dots, L$. If
$i = 1$, then we have nothing to do because $s_{\pi(k)} = 1$ for all $k$.
So, we assume that $i \geq 2$. If there exists $j$ such that
$s_{\pi(j)} > t_{\pi(j)}$, then a word $u_1^L$ defined by

\[
u_{\pi(k)} =
\begin{cases}
t_{\pi(k)} & k = 1, \dots, j-1, \\
s_{\pi(k)} - 1 & k = j, \dots, L
\end{cases}
\]

satisfies $\phi(u_1^L) = \pi$ and $u_1^L \in A_{i-1}^L$ for the same reason
as in the proof of the uniqueness, which violates the assumption that there
is no $s_1^L \in A_{i-1}^L$ such that $\phi(s_1^L) = \pi$. Hence, if we
define $c_k = t_k - s_k$ for $k = 1, \dots, L$, then $c_k \geq 0$ and
$c_{\pi(L)} \leq n - i$ because $t_{\pi(L)} \leq n$ and $s_{\pi(L)} = i$.
The remaining task for us is to show that $c_{\pi(k)} \leq c_{\pi(k+1)}$ for
$k = 1, \dots, L-1$. If $i = 1$, then
$c_{\pi(k)} = t_{\pi(k)} - 1 \leq t_{\pi(k+1)} - 1 = c_{\pi(k+1)}$ for
$k = 1, \dots, L-1$. Suppose that $i \geq 2$ and
$c_{\pi(j+1)} < c_{\pi(j)}$ for some $j$. Then, we have

\[
\begin{aligned}
c_{\pi(j+1)} < c_{\pi(j)} &\iff t_{\pi(j+1)} - s_{\pi(j+1)} < t_{\pi(j)} - s_{\pi(j)} \\
&\iff s_{\pi(j)} + (t_{\pi(j+1)} - t_{\pi(j)}) < s_{\pi(j+1)}.
\end{aligned}
\]

This implies that

\[ s_{\pi(j)} \leq s_{\pi(j)} + (t_{\pi(j+1)} - t_{\pi(j)}) \leq s_{\pi(j+1)} - 1 \quad (1) \]

because $t_{\pi(j+1)} \geq t_{\pi(j)}$. Let us introduce a word $u_1^L$ by

\[
u_{\pi(k)} =
\begin{cases}
s_{\pi(k)} & k = 1, \dots, j, \\
s_{\pi(k)} - 1 & k = j+1, \dots, L.
\end{cases}
\]

We claim that $\phi(u_1^L) = \pi$ and $u_1^L \in A_{i-1}^L$, which
contradicts the assumption that there is no $s_1^L \in A_{i-1}^L$ such that
$\phi(s_1^L) = \pi$. We only need to show that $\pi(j) < \pi(j+1)$ when
$u_{\pi(j)} = u_{\pi(j+1)}$. However, by (1), if
$s_{\pi(j)} = s_{\pi(j+1)} - 1$, then we have $t_{\pi(j)} = t_{\pi(j+1)}$,
which implies that $\pi(j) < \pi(j+1)$.

(ii): The number of sequences $c_1 \cdots c_L$ satisfying
$0 \leq c_{\pi(1)} \leq \cdots \leq c_{\pi(L)} \leq n - i$ is given by the
binomial coefficient $\binom{L+n-i}{n-i}$. Hence, we have
$|\phi^{-1}(\pi)| \leq \binom{L+n-i}{n-i}$ by (i). On the other hand, given
a sequence $c_1 \cdots c_L$ such that
$0 \leq c_{\pi(1)} \leq c_{\pi(2)} \leq \cdots \leq c_{\pi(L)} \leq n - i$,
the word $t_1^L \in A_n^L$ defined by $t_k = s_k + c_k$ for
$k = 1, \dots, L$ clearly satisfies $\phi(t_1^L) = \pi$. Hence, we have
$|\phi^{-1}(\pi)| \geq \binom{L+n-i}{n-i}$.

□
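The count in Lemma 5 (ii) can be checked by brute force for small $n$ and $L$. In the sketch below, the index $i$ at which a permutation first appears is found directly as the smallest maximal letter among the words of that type, so no later result is used. The function names `perm_type` and `check_lemma5` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb


def perm_type(word):
    # Permutation type via a stable sort (hypothetical helper name).
    return tuple(i + 1 for i in sorted(range(len(word)), key=lambda i: word[i]))


def check_lemma5(n, L):
    """Compare fiber sizes |phi^{-1}(pi)| over A_n^L with C(L+n-i, n-i)."""
    sizes = Counter()
    first = {}  # pi -> smallest alphabet size i for which pi is realized
    for w in product(range(1, n + 1), repeat=L):
        pi = perm_type(w)
        sizes[pi] += 1
        first[pi] = min(first.get(pi, n), max(w))
    return all(
        sizes[pi] == comb(L + n - first[pi], n - first[pi]) for pi in sizes
    )
```

For instance, with $n = L = 2$ the identity permutation first appears at $i = 1$ and has $\binom{3}{1} = 3$ preimages (11, 12, 22), while the permutation 21 first appears at $i = 2$ and has the single preimage 21.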



If there is no word $s_1^L \in A_{i-1}^L$ such that $\phi(s_1^L) = \pi$, but
there exists a (unique) word $s_1^L \in A_i^L$ such that
$\phi(s_1^L) = \pi$ for $i \geq 1$, then we say that $\pi$ appears for the
first time at $i$. We denote the number of permutations
$\pi \in \mathcal{S}_L$ that appear for the first time at $i$ by
$\nu(i, L)$. By Lemma 5, we have $\nu(1, L) = 1$ and

\[ \nu(n, L) = n^L - \sum_{i=1}^{n-1} \binom{L+n-i}{n-i} \nu(i, L) \quad (2) \]

for $n \geq 2$.

The following Proposition 6 and the subsequent paragraph in this subsection
are only for the record. They will not be used in later sections. So,
readers who are interested only in the main results of this paper can skip
them.

Proposition 6 A closed-form expression for $\nu(n, L)$ is given by the
following formula:

\[ \nu(n, L) = \sum_{k=0}^{n-1} (-1)^k \binom{L+1}{k} (n-k)^L. \quad (3) \]

Proof. We prove the formula by mathematical induction on $n$. If $n = 1$,
then we have $\nu(1, L) = 1$. Assume that the formula holds for natural
numbers $1, 2, \dots, n$. Then, we have

\[
\begin{aligned}
\nu(n+1, L) &= (n+1)^L - \sum_{i=1}^{n} \binom{L+n+1-i}{n+1-i} \nu(i, L) \\
&= (n+1)^L - \sum_{i=1}^{n} \binom{L+n+1-i}{n+1-i} \sum_{k=0}^{i-1} (-1)^k \binom{L+1}{k} (i-k)^L \\
&= (n+1)^L + \sum_{j=1}^{n} \Biggl( \sum_{\substack{i-k=j, \\ 1 \leq i \leq n}} (-1)^{k+1} \binom{L+n+1-i}{n+1-i} \binom{L+1}{k} \Biggr) j^L.
\end{aligned}
\]

It is enough to show that

\[ \sum_{\substack{i-k=j, \\ 1 \leq i \leq n}} (-1)^{k+1} \binom{L+n+1-i}{n+1-i} \binom{L+1}{k} = (-1)^{n+1-j} \binom{L+1}{n+1-j} \]

for $j = 1, \dots, n$. If we put $l = n - j$, then this is equivalent to
showing that

\[ \sum_{k=0}^{l} (-1)^{k+1} \binom{L+1+l-k}{l+1-k} \binom{L+1}{k} = (-1)^{l+1} \binom{L+1}{l+1} \quad (4) \]

for $l = 0, 1, \dots, n-1$. Consider the equality

\[ (1+x)^{-(L+1)} (1+x)^{L+1} = 1, \quad (5) \]

which holds for $|x| < 1$. The left-hand side of (5) can be written as

\[ \Biggl( \sum_{p=0}^{\infty} (-1)^p \binom{L+p}{p} x^p \Biggr) \Biggl( \sum_{q=0}^{L+1} \binom{L+1}{q} x^q \Biggr). \]

If we compare the coefficient of $x^{l+1}$ for $l = 0, 1, \dots$ in both
sides of the equality (5), then we obtain

\[ \sum_{p+q=l+1} (-1)^p \binom{L+p}{p} \binom{L+1}{q} = 0. \]

After a little algebra, we can derive the desired equality (4).

□

Note that (3) is identical to a closed-form expression for the Eulerian
number $\left\langle {L \atop n-1} \right\rangle$ [15], where the Eulerian
number $\left\langle {a \atop b} \right\rangle$ is the number of
permutations $\pi$ of $\{1, \dots, a\}$ that have exactly $b$ ascents,
namely, $b$ places with $\pi(j) < \pi(j+1)$. The equality (2) is equivalent
to the so-called Worpitzky's identity:

\[ n^L = \sum_{k=0}^{L-1} \left\langle {L \atop k} \right\rangle \binom{n+k}{L}. \quad (6) \]

Indeed, one can obtain Worpitzky's identity (6) from (2) by a little algebra
using the symmetry law
$\left\langle {L \atop k} \right\rangle = \left\langle {L \atop L-1-k} \right\rangle$.
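The recursion (2), the closed form (3) and the Eulerian numbers can be compared numerically for small parameters. The sketch below is only a numerical sanity check; the names `nu_recursive`, `nu_closed` and `eulerian` are hypothetical, and the Eulerian numbers are obtained by direct enumeration of permutations rather than from a library.

```python
from functools import lru_cache
from itertools import permutations
from math import comb


@lru_cache(maxsize=None)
def nu_recursive(n, L):
    # Equation (2): nu(1, L) = 1 and
    # nu(n, L) = n^L - sum_{i=1}^{n-1} C(L+n-i, n-i) * nu(i, L).
    if n == 1:
        return 1
    return n**L - sum(
        comb(L + n - i, n - i) * nu_recursive(i, L) for i in range(1, n)
    )


def nu_closed(n, L):
    # Equation (3): alternating sum over k = 0, ..., n-1.
    return sum((-1) ** k * comb(L + 1, k) * (n - k) ** L for k in range(n))


def eulerian(L, b):
    # Number of permutations of {1, ..., L} with exactly b ascents (brute force).
    return sum(
        1
        for p in permutations(range(1, L + 1))
        if sum(x < y for x, y in zip(p, p[1:])) == b
    )
```

For $L = 3$ the three functions give the values 1, 4, 1 for $n = 1, 2, 3$, which sum to $3! = 6$ as expected.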

2.3 The Minimal Realization Map 

For any $\pi \in \mathcal{S}_L$, we can construct a word
$s_1^L \in \mathbb{N}^L$ such that $\phi(s_1^L) = \pi$ by the following
procedure: first, we decompose the sequence $\pi(1) \cdots \pi(L)$ into
maximal ascending sequences. A subsequence $i_j \cdots i_{j+k}$ of a
sequence $i_1 \cdots i_L$ is called a maximal ascending sequence if it is
ascending, namely, $i_j < i_{j+1} < \cdots < i_{j+k}$, and neither
$i_{j-1} i_j \cdots i_{j+k}$ nor $i_j \cdots i_{j+k} i_{j+k+1}$ is
ascending. Suppose
$\pi(1) \cdots \pi(i_1), \ \pi(i_1+1) \cdots \pi(i_2), \ \dots, \ \pi(i_{k-1}+1) \cdots \pi(L)$
is a decomposition of $\pi(1) \cdots \pi(L)$ into maximal ascending
sequences. If we define a word $s_1^L \in \mathbb{N}^L$ by

\[
s_{\pi(1)} = \cdots = s_{\pi(i_1)} = 1, \quad
s_{\pi(i_1+1)} = \cdots = s_{\pi(i_2)} = 2, \quad \dots, \quad
s_{\pi(i_{k-1}+1)} = \cdots = s_{\pi(L)} = k,
\]

then we have $\phi(s_1^L) = \pi$ by construction. Thus, $\pi$ appears for
the first time at some $i \leq k$. We denote the word $s_1^L$ by
$\mu(\pi)$. $\mu$ defines a map $\mu : \mathcal{S}_L \to \mathbb{N}^L$ such
that $\phi \circ \mu(\pi) = \pi$ for any $\pi \in \mathcal{S}_L$.

For example, if $\pi \in \mathcal{S}_5$ is given by
$\pi(1)\pi(2)\pi(3)\pi(4)\pi(5) = 24315$, then its decomposition into
maximal ascending sequences is 24, 3, 15. If we put
$s_2 s_4 s_3 s_1 s_5 = 11233$, then we obtain
$\mu(\pi) = s_1 s_2 s_3 s_4 s_5 = 31213$.

Let $\pi \in \mathcal{S}_L$ appear for the first time at $n$. By Lemma 5,
there exists a unique word $s_1^L \in A_n^L$ such that
$\phi(s_1^L) = \pi$. We say that $s_1^L$ is a minimal realization of $\pi$.
In the following, we shall show that $\mu(\pi)$ is a minimal realization of
$\pi$.
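The construction of $\mu$ amounts to a single pass over $\pi(1) \cdots \pi(L)$ in which the letter is increased at every descent. The sketch below illustrates this; the function names `minimal_realization` and `perm_type` are hypothetical, and permutations are stored as 1-based tuples.

```python
from itertools import permutations


def perm_type(word):
    # Permutation type via a stable sort (hypothetical helper name).
    return tuple(i + 1 for i in sorted(range(len(word)), key=lambda i: word[i]))


def minimal_realization(pi):
    """Return mu(pi) for a permutation given as (pi(1), ..., pi(L))."""
    word = [0] * len(pi)
    level = 1
    for pos, idx in enumerate(pi):
        # A descent pi(pos) > pi(pos+1) starts a new maximal ascending sequence.
        if pos > 0 and pi[pos - 1] > idx:
            level += 1
        word[idx - 1] = level
    return tuple(word)


# phi(mu(pi)) == pi for every permutation of length 4.
round_trip_ok = all(
    perm_type(minimal_realization(p)) == p
    for p in permutations(range(1, 5))
)
```

For the example in the text, `minimal_realization((2, 4, 3, 1, 5))` gives `(3, 1, 2, 1, 3)`.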

Proposition 7 The following statements are equivalent:

(i) $s_1^L \in A_n^L$ is a minimal realization of some permutation
$\pi \in \mathcal{S}_L$ that appears for the first time at $n$.

(ii) For any $1 \leq i \leq n-1$, there exist $1 \leq j < k \leq L$ such
that $s_j = i+1$, $s_k = i$.

Proof. When $n = 1$, the equivalence is trivial. So, we assume that
$n \geq 2$ in the following discussion.

(i)$\Rightarrow$(ii): Let $s_1^L \in A_n^L$ be a minimal realization of
$\pi \in \mathcal{S}_L$ that appears for the first time at $n$. Suppose that
statement (ii) does not hold. Then, there exists $1 \leq i \leq n-1$ such
that, for any $1 \leq j, k \leq L$, if $s_k = i$ and $s_j = i+1$, then
$k < j$. Let us define a word $t_1^L$ by

\[
t_j =
\begin{cases}
s_j - 1 & \text{if } s_j = i+1, \\
s_j & \text{otherwise.}
\end{cases}
\]

We claim that $\phi(t_1^L) = \pi$. By Corollary 4, it is sufficient to show
that $s_k \leq s_j \iff t_k \leq t_j$ for all $1 \leq k \leq j \leq L$. Fix
$1 \leq k \leq j \leq L$. Assume that $s_k \leq s_j$. If $s_j = i+1$, then
we have $t_j = s_j - 1 = i$. If we also have $s_k = i+1$, then
$t_k = s_k - 1 = i = t_j$. Otherwise, we have $s_k \neq i+1$. Thus, we
obtain $s_k \leq i$ because $s_k \leq s_j = i+1$. Then,
$t_k = s_k \leq i = t_j$. On the other hand, if $s_j \neq i+1$, then we have
$t_j = s_j$. Thus, we obtain $t_k \leq s_k \leq s_j = t_j$. To show the
reverse direction, let us assume that $t_k \leq t_j$. If $s_j = i+1$, then
$t_j = s_j - 1 = i$. If we also have $s_k = i+1$, then $s_k = s_j$.
Otherwise, we have $s_k \neq i+1$, and then $t_k = s_k$ so that
$s_k = t_k \leq t_j = i < s_j$. On the other hand, if $s_j \neq i+1$, then
we have $t_j = s_j$. If we also have $s_k \neq i+1$, then
$s_k = t_k \leq t_j = s_j$. Otherwise, we have $s_k = i+1$, and then
$t_k = s_k - 1$. Suppose that $s_k > s_j$. Then, $s_j < s_k = i+1$ and
$i = s_k - 1 = t_k \leq t_j = s_j$. Hence, $s_j = i$. Since we have assumed
that (ii) does not hold, we obtain $j < k$. However, this contradicts our
other assumption that $k \leq j$. Hence, we have $s_k \leq s_j$.

Suppose that there exists $j$ such that $s_j = i+1$. Then, we have
$t_1^L \neq s_1^L$. This contradicts the uniqueness of the minimal
realization of $\pi$ because both $s_1^L$ and $t_1^L$ are contained in
$A_n^L$. Suppose that there exists no $j$ such that $s_j = i+1$. Since
$\pi$ appears for the first time at $n$ and $s_1^L$ is its minimal
realization, we have $s_{\pi(L)} = n$. Hence, $i+1 < n$ should hold. Let us
take the least $j$ such that $i+1 < s_{\pi(j)}$ and put it as $j_0$. If we
define a word $t_1^L$ by

\[
t_{\pi(j)} =
\begin{cases}
s_{\pi(j)} - 1 & \text{if } j \geq j_0, \\
s_{\pi(j)} & \text{otherwise,}
\end{cases}
\]

then we have $\phi(t_1^L) = \pi$. Indeed,
$t_{\pi(j_0-1)} = s_{\pi(j_0-1)} < i+1 \leq s_{\pi(j_0)} - 1 = t_{\pi(j_0)}$
because $i+1 < s_{\pi(j_0)}$. On the other hand, we have
$t_1^L \in A_{n-1}^L$, which is a contradiction.

(ii)$\Rightarrow$(i): Assume that $s_1^L \in A_n^L$ satisfies (ii). Let
$t_1^L \in A_{n-i}^L$ be a minimal realization of $\pi = \phi(s_1^L)$. We
shall show that $i = 0$. By Lemma 5, we have
$t_{\pi(k)} \leq s_{\pi(k)}$ for $k = 1, \dots, L$ and
$0 \leq c_{\pi(1)} \leq \cdots \leq c_{\pi(L)} = n - (n-i) = i$ for
$c_k = s_k - t_k$. Suppose there exists $j$ such that
$1 \leq c_{\pi(j)}$. Take the least $j$ such that $1 \leq c_{\pi(j)}$ and
put it as $j_0$. Now, consider the least $k$ such that
$s_{\pi(k)} = s_{\pi(j_0)}$ and the largest $k'$ such that
$s_{\pi(k')} = s_{\pi(j_0)}$, and put them as $k_0$ and $k_1$, respectively.
Then, we have $t_{\pi(k_0)} = t_{\pi(k_1)}$. Indeed,
$t_{\pi(k_0)} = s_{\pi(k_0)} - c_{\pi(k_0)} = s_{\pi(k_1)} - c_{\pi(k_0)} \geq s_{\pi(k_1)} - c_{\pi(k_1)} = t_{\pi(k_1)}$.
On the other hand, $t_{\pi(k_0)} \leq t_{\pi(k_1)}$ because
$k_0 \leq k_1$. Thus, we obtain $t_{\pi(k_0)} = t_{\pi(k_1)}$. This means
that $c_{\pi(k_0)} = c_{\pi(k_1)}$, which, in turn, implies
$c_{\pi(k)} = c_{\pi(j_0)}$ for all $k_0 \leq k \leq k_1$. (Thus,
$j_0 = k_0$.) If we define a word $u_1^L$ by

\[
u_{\pi(k)} =
\begin{cases}
s_{\pi(k)} - 1 & \text{if } k_0 \leq k \leq k_1, \\
s_{\pi(k)} & \text{otherwise,}
\end{cases}
\]

then we have $\phi(u_1^L) = \pi$. To show this, we need to consider only
$k = k_0 - 1, k_0$ and $k = k_1, k_1 + 1$. First, let us consider the
former. By the definition of $u_1^L$ and $k_0$, we have
$u_{\pi(k_0-1)} = s_{\pi(k_0-1)} = t_{\pi(k_0-1)}$. We also have
$u_{\pi(k_0)} = s_{\pi(k_0)} - 1 \geq t_{\pi(k_0)}$ because
$s_{\pi(k_0)} - t_{\pi(k_0)} = c_{\pi(k_0)} \geq 1$. Hence,
$u_{\pi(k_0-1)} = t_{\pi(k_0-1)} \leq t_{\pi(k_0)} \leq u_{\pi(k_0)}$. If
$u_{\pi(k_0-1)} = u_{\pi(k_0)}$, then $t_{\pi(k_0-1)} = t_{\pi(k_0)}$,
which implies that $\pi(k_0-1) < \pi(k_0)$. The latter is obvious because
$u_{\pi(k_1)} = s_{\pi(k_1)} - 1 < s_{\pi(k_1+1)} = u_{\pi(k_1+1)}$.

Now, if we put $s_{\pi(j_0)} = a$ ($\geq 2$), then there exist $j_1 < j_2$
such that $s_{j_1} = a$ and $s_{j_2} = a - 1$ by (ii). By the construction
of $u_1^L$, we have $u_{j_1} = a - 1 = u_{j_2}$. This implies that $u_1^L$
and $s_1^L$ have different rank sequences because
$\varphi(u_1^L)_{j_2} > \varphi(s_1^L)_{j_2}$. Thus, we have
$\phi(u_1^L) = \iota \circ \varphi(u_1^L) \neq \iota \circ \varphi(s_1^L) = \phi(s_1^L)$,
which is a contradiction.

□

Corollary 8 For $\pi \in \mathcal{S}_L$, $\mu(\pi)$ is a minimal realization
of $\pi$.

Proof. Let

\[ \pi(1) \cdots \pi(j_1), \ \pi(j_1+1) \cdots \pi(j_2), \ \dots, \ \pi(j_{k-1}+1) \cdots \pi(L) \]

be a decomposition of $\pi(1) \cdots \pi(L)$ into maximal ascending
sequences. If $s_1^L = \mu(\pi)$, then

\[
s_{\pi(1)} = \cdots = s_{\pi(j_1)} = 1, \quad
s_{\pi(j_1+1)} = \cdots = s_{\pi(j_2)} = 2, \quad \dots, \quad
s_{\pi(j_{k-1}+1)} = \cdots = s_{\pi(L)} = k
\]

by the definition of $\mu$. For each $1 \leq i \leq k-1$, we have
$s_{\pi(j_i)} = i$, $s_{\pi(j_i+1)} = i+1$ and
$\pi(j_i) > \pi(j_i+1)$. Hence, condition (ii) of Proposition 7 is satisfied
by $s_1^L$. Since $\phi(s_1^L) = \pi$, $s_1^L$ is a minimal realization of
$\pi$.

□

2.4 The Duality 

We can make the pair of maps

\[ \phi : \mathbb{N}^L \rightleftarrows \mathcal{S}_L : \mu \]

form a Galois connection [9] in the following way: we consider the set
$\mathcal{S}_L$ as an ordered set with the discrete order, namely, we define
an order relation $\leq_{\mathcal{S}_L}$ on $\mathcal{S}_L$ by
$\pi \leq_{\mathcal{S}_L} \pi' :\iff \pi = \pi'$. On the other hand, we
introduce an order relation $\leq_{\mathbb{N}^L}$ on $\mathbb{N}^L$ by
$s_1^L \leq_{\mathbb{N}^L} t_1^L :\iff \phi(s_1^L) = \phi(t_1^L) =: \pi$ and
there exist $0 \leq c_{\pi(1)} \leq \cdots \leq c_{\pi(L)}$ such that
$s_k = c_k + t_k$. By Corollary 8, we have

\[ \phi(s_1^L) \leq_{\mathcal{S}_L} \pi \iff s_1^L \leq_{\mathbb{N}^L} \mu(\pi) \]

for $s_1^L \in \mathbb{N}^L$ and $\pi \in \mathcal{S}_L$.

If we restrict the domain of the map $\phi$ to $A_n^L$, we obtain the
following form of the duality stated in Theorem 9 (iv) below. Theorem 9
summarizes the main results of this section.

Theorem 9 Let us set the domain of the coarse-graining map $\phi$ to
$A_n^L$.

(i) For $\pi \in \mathcal{S}_L$, if $\phi^{-1}(\pi) \neq \emptyset$, then
the value of $|\phi^{-1}(\pi)|$ takes a binomial coefficient
$\binom{L+n-i}{n-i}$ for some $1 \leq i \leq n$.

(ii) For $\pi \in \mathcal{S}_L$, the following two statements are
equivalent:

(a) $|\phi^{-1}(\pi)| = 1$.

(b) $\pi$ appears for the first time at $n$.

(iii) For $s_1^L \in A_n^L$, the following three statements are equivalent:

(c) $\phi^{-1}(\pi) = \{s_1^L\}$ for some $\pi \in \mathcal{S}_L$.

(d) For any $1 \leq i \leq n-1$ there exist $1 \leq j < k \leq L$ such that
$s_j = i+1$, $s_k = i$.

(e) $s_1^L \notin A_{n-1}^L$ and $s_1^L = \mu \circ \phi(s_1^L)$.

(iv) If we restrict $\phi$ to the subset of $A_n^L$ consisting of words
satisfying one of the three equivalent conditions in (iii), then $\phi$
gives a one-to-one correspondence between these words and permutations of
length $L$ satisfying one of the two equivalent conditions in (ii), with
inverse $\mu$.

Proof. (i) If $\pi$ appears for the first time at $i \leq n$, then
$|\phi^{-1}(\pi)| = \binom{L+n-i}{n-i}$ by Lemma 5 (ii).

(ii) (a)$\Rightarrow$(b): Suppose $|\phi^{-1}(\pi)| = 1$ and $\pi$ appears
for the first time at $i \leq n$. By (i), $\binom{L+n-i}{n-i} = 1$ holds.
This happens if and only if $i = n$.

(b)$\Rightarrow$(a): If $\pi$ appears for the first time at $n$, then there
exists a unique $s_1^L \in A_n^L$ such that $\phi(s_1^L) = \pi$. Hence,
$\phi^{-1}(\pi) = \{s_1^L\}$.

(iii) (c)$\Rightarrow$(d),(e): Assume $\phi^{-1}(\pi) = \{s_1^L\}$ for some
$\pi \in \mathcal{S}_L$. By (ii), $\pi$ appears for the first time at $n$.
Hence, $s_1^L$ is a minimal realization of $\pi$. Hence, (d) holds for
$s_1^L$ by Proposition 7. To see that (e) holds, first observe that $s_1^L$
cannot be contained in $A_{n-1}^L$. We also have
$s_1^L = \mu(\pi) = \mu(\phi(s_1^L))$ because $\mu(\pi)$ is a minimal
realization of $\pi$ by Corollary 8.

(d)$\Rightarrow$(c): If (d) holds for $s_1^L$, then $s_1^L$ is a minimal
realization of some $\pi \in \mathcal{S}_L$ that appears for the first time
at $n$ by Proposition 7. Hence, we have $\phi^{-1}(\pi) = \{s_1^L\}$ by the
uniqueness of the minimal realization.

(e)$\Rightarrow$(c): Assume $s_1^L \notin A_{n-1}^L$ and
$s_1^L = \mu(\phi(s_1^L))$. $s_1^L$ is a minimal realization of
$\phi(s_1^L)$ by Corollary 8. $\phi(s_1^L)$ appears for the first time at
$n$ since $s_1^L \notin A_{n-1}^L$. Hence,
$\phi^{-1}(\phi(s_1^L)) = \{s_1^L\}$ holds by (ii).

(iv) Let us put
$X = \{ s_1^L \in A_n^L \mid s_1^L \notin A_{n-1}^L, \ s_1^L = \mu \circ \phi(s_1^L) \}$
and $Y = \{ \pi \in \mathcal{S}_L \mid |\phi^{-1}(\pi)| = 1 \}$. If
$s_1^L \in X$, then $\phi^{-1}(\phi(s_1^L)) = \{s_1^L\}$. Hence, $\phi$
restricted to $X$ is a map from $X$ into $Y$. On the other hand, $\mu$
restricted to $Y$ is a map from $Y$ into $X$. Indeed, $\pi \in Y$ appears
for the first time at $n$ by (ii). Since $\mu(\pi)$ is a minimal realization
of $\pi$ by Corollary 8, it must hold that
$\phi^{-1}(\pi) = \{\mu(\pi)\}$. Thus, we have
$\mu(\pi) \notin A_{n-1}^L$ and $\mu(\pi) \in A_n^L$. We also have
$\mu(\pi) = \mu \circ \phi \circ \mu(\pi)$ because $\phi \circ \mu$ is an
identity on $\mathcal{S}_L$. Now, $\mu$ restricted to $Y$ is a left inverse
of $\phi$ restricted to $X$ by the definition of $X$. It is also a right
inverse because $\phi \circ \mu$ is an identity on $\mathcal{S}_L$. □
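The three equivalent conditions of Theorem 9 (iii) can be compared word by word on small examples. The sketch below enumerates $A_n^L$ and tests whether (c), (d) and (e) agree for every word. The names `perm_type`, `minimal_realization`, `satisfies_d` and `check_theorem9` are hypothetical, and the check covers only the small parameter values passed to it.

```python
from collections import defaultdict
from itertools import product


def perm_type(word):
    # Permutation type via a stable sort (hypothetical helper name).
    return tuple(i + 1 for i in sorted(range(len(word)), key=lambda i: word[i]))


def minimal_realization(pi):
    # mu(pi): raise the letter by one at each descent of pi(1) ... pi(L).
    word, level = [0] * len(pi), 1
    for pos, idx in enumerate(pi):
        if pos > 0 and pi[pos - 1] > idx:
            level += 1
        word[idx - 1] = level
    return tuple(word)


def satisfies_d(word, n):
    # Condition (d): for each i < n some letter i+1 occurs before some letter i.
    L = len(word)
    return all(
        any(word[j] == i + 1 and word[k] == i
            for j in range(L) for k in range(j + 1, L))
        for i in range(1, n)
    )


def check_theorem9(n, L):
    words = list(product(range(1, n + 1), repeat=L))
    fibers = defaultdict(list)
    for w in words:
        fibers[perm_type(w)].append(w)
    for w in words:
        c = len(fibers[perm_type(w)]) == 1                             # (c)
        d = satisfies_d(w, n)                                          # (d)
        e = max(w) == n and minimal_realization(perm_type(w)) == w     # (e)
        if not (c == d == e):
            return False
    return True
```

In condition (e), `max(w) == n` expresses that the word uses the letter $n$, that is, it does not lie in $A_{n-1}^L$.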



3 Permutation Entropy Rate Revisited 

Let $\mathbf{S} = \{S_1, S_2, \cdots\}$ be a finite-state stationary stochastic process, where the stochastic variables $S_i$ take their values in $A_n$. Stationarity means that

$$\Pr\{S_1 = s_1, \cdots, S_L = s_L\} = \Pr\{S_{k+1} = s_1, \cdots, S_{k+L} = s_L\}$$

for any $k, L \geq 1$ and $s_1, \cdots, s_L \in A_n$. For simplicity, we write $p(s_1^L) = p(s_1 \cdots s_L)$ instead of $\Pr\{S_1 = s_1, \cdots, S_L = s_L\}$. In the following discussion, we set the domain of the map $\phi$ introduced in Section 2 to $A_n^L$.

The entropy rate $h(\mathbf{S})$ of a finite-state stationary stochastic process $\mathbf{S} = \{S_1, S_2, \cdots\}$ is defined by

$$h(\mathbf{S}) = \lim_{L \to \infty} \frac{1}{L} H(S_1^L), \tag{7}$$

where $H(S_1^L) = H(S_1, \cdots, S_L) = -\sum_{s_1^L \in A_n^L} p(s_1^L) \log p(s_1^L)$. Here, we take the base of the logarithm as 2. It is well known that the limit exists for any finite-state stationary stochastic process [16].

The permutation entropy rate $h^*(\mathbf{S})$ of a finite-state stationary stochastic process $\mathbf{S} = \{S_1, S_2, \cdots\}$ is defined by

$$h^*(\mathbf{S}) = \lim_{L \to \infty} \frac{1}{L} H^*(S_1^L), \tag{8}$$

where $H^*(S_1^L) = H^*(S_1, \cdots, S_L) = -\sum_{\pi \in S_L} p(\pi) \log p(\pi)$ and $p(\pi)$ is the probability that $\pi$ is realized in $\mathbf{S}$, namely, $p(\pi) = \sum_{s_1^L \in \phi^{-1}(\pi)} p(s_1^L)$ for $\pi \in S_L$. Amigó et al. proved that the limit exists for all finite-state stationary stochastic processes and is equal to $h(\mathbf{S})$ [4]. They first showed the equality under the assumption of ergodicity. Then, they proceeded to the general case by appealing to the ergodic decomposition theorem for the entropy rate.
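Definitions (7) and (8) can be evaluated directly for short words. The Python sketch below is an illustration under stated assumptions. It uses a hypothetical i.i.d. process with symbol probabilities `probs`. It takes $\phi$ to order positions by value with ties broken by position. The names `phi` and `block_entropies` are ours, and symbols are labelled from 0 for convenience.

```python
from collections import defaultdict
from itertools import product
from math import log2


def phi(word):
    """Permutation type: positions sorted by value, ties broken by position
    (assumed convention; sorted() is stable)."""
    return tuple(sorted(range(len(word)), key=lambda i: word[i]))


def block_entropies(probs, L):
    """Return (H(S_1^L), H*(S_1^L)) in bits for an i.i.d. process
    whose symbol probabilities are given by probs."""
    p_perm = defaultdict(float)
    H = 0.0
    for word in product(range(len(probs)), repeat=L):
        q = 1.0
        for s in word:
            q *= probs[s]
        if q > 0:
            H -= q * log2(q)
            p_perm[phi(word)] += q  # p(pi): total probability of the fibre of pi
    H_star = sum(-q * log2(q) for q in p_perm.values())
    return H, H_star


if __name__ == "__main__":
    probs = [0.3, 0.7]  # hypothetical symbol probabilities
    for L in range(1, 11):
        H, H_star = block_entropies(probs, L)
        print(L, round(H / L, 4), round(H_star / L, 4))
```

Because $p(\pi)$ is obtained by merging word probabilities, $H^*(S_1^L)$ never exceeds $H(S_1^L)$. The printed columns show how the two per-symbol quantities compare as $L$ grows for this particular choice of `probs`.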

If we make use of the rank variables $R_i = \sum_{j=1}^{i} \delta(S_j \leq S_i)$ for $i = 1, 2, \cdots$ introduced in [3], then the permutation entropy rate has the following alternative expression by Proposition 2 and Proposition 3:

$$h^*(\mathbf{S}) = \lim_{L \to \infty} \frac{1}{L} H(R_1^L).$$

Intuitively, the entropy rate quantifies the uncertainty of values per symbol, while the permutation entropy rate quantifies the uncertainty of orderings between values per symbol.

In the following discussion, we give an elementary alternative proof of $h(\mathbf{S}) = h^*(\mathbf{S})$ for a finite-state stationary stochastic process $\mathbf{S} = \{S_1, S_2, \cdots\}$ based on the duality between values and orderings established in Section 2.

Lemma 10

$$0 \leq H(S_1^L) - H^*(S_1^L) \leq \left( \sum_{\pi \in S_L,\ |\phi^{-1}(\pi)| > 1} p(\pi) \right) n \log(L + n). \tag{9}$$



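Proof. We have

$$H(S_1^L) - H^*(S_1^L) = \sum_{\pi \in S_L,\ p(\pi) > 0} p(\pi) \left( - \sum_{s_1^L \in \phi^{-1}(\pi)} \frac{p(s_1^L)}{p(\pi)} \log \frac{p(s_1^L)}{p(\pi)} \right).$$

Now, we have

$$0 \leq - \sum_{s_1^L \in \phi^{-1}(\pi)} \frac{p(s_1^L)}{p(\pi)} \log \frac{p(s_1^L)}{p(\pi)} \leq n \log(L + n)$$

for $\pi \in S_L$ such that $\phi^{-1}(\pi) \neq \emptyset$ and $p(\pi) > 0$. Indeed, the middle term is the entropy of a probability distribution on $\phi^{-1}(\pi)$, so it is at most $\log |\phi^{-1}(\pi)|$, and $|\phi^{-1}(\pi)|$ is equal to the binomial coefficient $\binom{L+n-i}{n-i} \leq (L+n)^n$ for some $1 \leq i \leq n$ by Theorem 9 (i).

Note that if $i = n$, then $|\phi^{-1}(\pi)| = 1$, which implies

$$- \sum_{s_1^L \in \phi^{-1}(\pi)} \frac{p(s_1^L)}{p(\pi)} \log \frac{p(s_1^L)}{p(\pi)} = 0.$$

□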



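Theorem 11 For any finite-state stationary stochastic process $\mathbf{S} = \{S_1, S_2, \cdots\}$, $h(\mathbf{S}) = h^*(\mathbf{S})$.

Proof. Since we have

$$\sum_{\pi \in S_L,\ |\phi^{-1}(\pi)| > 1} p(\pi) \leq 1 \quad \text{and} \quad \frac{n \log(L + n)}{L} \to 0 \quad (L \to \infty),$$

we obtain

$$h^*(\mathbf{S}) = \lim_{L \to \infty} \frac{1}{L} H^*(S_1^L) = \lim_{L \to \infty} \frac{1}{L} H(S_1^L) = h(\mathbf{S})$$

by Lemma 10.

□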



4 Permutation Excess Entropy 

The excess entropy [17] $E(\mathbf{S})$ of a finite-state stationary stochastic process $\mathbf{S} = \{S_1, S_2, \cdots\}$ is defined by

$$E(\mathbf{S}) = \lim_{L \to \infty} \left( H(S_1^L) - h(\mathbf{S}) L \right), \tag{10}$$

if the limit on the right-hand side exists. The excess entropy $E(\mathbf{S})$ is a measure of global correlation present in a finite-state stationary stochastic process $\mathbf{S} = \{S_1, S_2, \cdots\}$. If $E(\mathbf{S})$ exists, then we can write [11]

$$E(\mathbf{S}) = \sum_{L=1}^{\infty} \left( H(S_L \mid S_1^{L-1}) - h(\mathbf{S}) \right) = \lim_{L \to \infty} I(S_1^L; S_{L+1}^{2L}), \tag{11}$$

where $H(Y \mid X)$ is the conditional entropy of $Y$ given $X$ and $I(X; Y)$ is the mutual information between $X$ and $Y$ for stochastic variables $X$ and $Y$.

The permutation excess entropy $E^*(\mathbf{S})$ of a finite-state stationary stochastic process $\mathbf{S} = \{S_1, S_2, \cdots\}$ is defined by

$$E^*(\mathbf{S}) = \lim_{L \to \infty} \left( H^*(S_1^L) - h^*(\mathbf{S}) L \right), \tag{12}$$

if the limit on the right-hand side exists.

When $E^*(\mathbf{S})$ exists, it is straightforward to obtain an alternative expression for the permutation excess entropy similar to expression (11) for the excess entropy:

$$E^*(\mathbf{S}) = \sum_{L=1}^{\infty} \left( H(R_L \mid R_1^{L-1}) - h^*(\mathbf{S}) \right). \tag{13}$$

Note that we also have the equality $h^*(\mathbf{S}) = \lim_{L \to \infty} H(R_L \mid R_1^{L-1})$, which is an analog of the alternative expression for the entropy rate $h(\mathbf{S}) = \lim_{L \to \infty} H(S_L \mid S_1^{L-1})$, because the right-hand side of (13) converges. We can prove that the permutation excess entropy $E^*(\mathbf{S})$ also admits a mutual information expression if the process $\mathbf{S}$ is ergodic Markov, which will be presented elsewhere [18].
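The role of the rank variables can be checked numerically on short words. The Python sketch below is an illustration under stated assumptions. It reads the rank variables as $R_i = \sum_{j \leq i} \delta(S_j \leq S_i)$ and takes $\phi$ to order positions by value with ties broken by position. It uses a hypothetical i.i.d. process, and all function names are ours. It compares the entropy of the rank sequence $R_1^L$ with $H^*(S_1^L)$ and prints the increments $H(R_1^L) - H(R_1^{L-1}) = H(R_L \mid R_1^{L-1})$ that appear in (13).

```python
from collections import defaultdict
from itertools import product
from math import log2


def phi(word):
    """Permutation type: positions sorted by value, ties broken by position."""
    return tuple(sorted(range(len(word)), key=lambda i: word[i]))


def ranks(word):
    """Rank sequence (R_1, ..., R_L) with R_i = #{j <= i : s_j <= s_i}."""
    return tuple(
        sum(1 for j in range(i + 1) if word[j] <= word[i])
        for i in range(len(word))
    )


def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return sum(-q * log2(q) for q in dist.values() if q > 0)


def perm_and_rank_entropy(probs, L):
    """Return (H*(S_1^L), H(R_1^L)) for an i.i.d. process with probabilities probs."""
    p_perm, p_rank = defaultdict(float), defaultdict(float)
    for word in product(range(len(probs)), repeat=L):
        q = 1.0
        for s in word:
            q *= probs[s]
        p_perm[phi(word)] += q
        p_rank[ranks(word)] += q
    return entropy(p_perm), entropy(p_rank)


if __name__ == "__main__":
    probs = [0.3, 0.7]  # hypothetical symbol probabilities
    previous = 0.0
    for L in range(1, 9):
        h_perm, h_rank = perm_and_rank_entropy(probs, L)
        print(L, round(h_perm, 6), round(h_rank, 6), round(h_rank - previous, 6))
        previous = h_rank
```

Under these conventions the rank sequence and the permutation type determine each other, so the two entropy columns agree.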

We would like to know whether E(S) = E*(S) holds or not for a given 
finite-state stationary stochastic process S. In the rest of the paper, we 
give a partial answer to this problem. In particular, we will show that 
E(S) = E*(S) for any finite-state stationary ergodic Markov process. 

Note that we always have $E^*(\mathbf{S}) \leq E(\mathbf{S})$ if the limits on both sides exist, because $H^*(S_1^L) \leq H(S_1^L)$ and $h^*(\mathbf{S}) = h(\mathbf{S})$ by Lemma 10 and Theorem 11, respectively. To show $E^*(\mathbf{S}) = E(\mathbf{S})$, it is sufficient to show that

$$\left( \sum_{\pi \in S_L,\ |\phi^{-1}(\pi)| > 1} p(\pi) \right) \log L \to 0 \quad (L \to \infty)$$

if $E(\mathbf{S})$ exists. Let us put

$$Q_L := \sum_{\pi \in S_L,\ |\phi^{-1}(\pi)| > 1} p(\pi).$$

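Lemma 12 Let $\varepsilon$ be a positive real number and $L$ be a positive integer. Assume that for any $s \in A_n$,

$$\Pr\{s_1^{\lfloor L/2 \rfloor} \mid s_j \neq s \text{ for any } 1 \leq j \leq \lfloor L/2 \rfloor\} \leq \varepsilon$$

holds, where $\lfloor x \rfloor$ is the largest integer not greater than $x$. Then, we have $Q_L \leq 2n\varepsilon$.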

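Proof. We shall prove

$$\sum_{\pi \in S_L,\ |\phi^{-1}(\pi)| = 1} p(\pi) \geq 1 - 2n\varepsilon.$$

Let us consider a word $s_1^L \in A_n^L$ satisfying the following two conditions:

(i) Each symbol $s \in A_n$ appears in $s_1^{\lfloor L/2 \rfloor}$ at least once.

(ii) Each symbol $s \in A_n$ appears in $s_{\lfloor L/2 \rfloor + 1}^{L}$ at least once.

By the assumption of the lemma, we have

$$\Pr\{s_1^{\lfloor L/2 \rfloor} \mid \text{(i) holds}\} \geq 1 - n\varepsilon,$$

because

$$\Pr\{s_1^{\lfloor L/2 \rfloor} \mid \text{(i) holds}\} + \sum_{s=1}^{n} \Pr\{s_1^{\lfloor L/2 \rfloor} \mid s_j \neq s \text{ for any } 1 \leq j \leq \lfloor L/2 \rfloor\} \geq 1.$$

Similarly,

$$\Pr\{s_{\lfloor L/2 \rfloor + 1}^{L} \mid \text{(ii) holds}\} \geq 1 - n\varepsilon$$

holds because of the stationarity. Hence, we have both

$$\Pr\{s_1^L \mid \text{(i) holds}\} \geq 1 - n\varepsilon \quad \text{and} \quad \Pr\{s_1^L \mid \text{(ii) holds}\} \geq 1 - n\varepsilon,$$

which imply

$$\Pr\{s_1^L \mid \text{both (i) and (ii) hold}\} \geq 1 - 2n\varepsilon.$$

It is clear that a word $s_1^L \in A_n^L$ satisfying both (i) and (ii) fulfills condition (d) in Theorem 9 (iii). Hence, by Theorem 9 (iv), we obtain

$$\sum_{\pi \in S_L,\ |\phi^{-1}(\pi)| = 1} p(\pi) = {\sum}^{*}\, p(s_1^L) \geq \Pr\{s_1^L \mid \text{both (i) and (ii) hold}\} \geq 1 - 2n\varepsilon,$$

where $\sum^{*}$ is the sum over all words satisfying condition (d) in Theorem 9 (iii).

□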

As a first simple application of Lemma 12, let us consider a stochastic process $\mathbf{S} = \{S_1, S_2, \cdots\}$ such that the stochastic variables $S_i$ are independent and identically distributed, namely, each symbol $s \in A_n$ appears with probability $p(s) > 0$ independently. If we put $0 < a := \min_{s \in A_n} \{p(s)\} < 1$, then we have

$$\Pr\{s_1^{\lfloor L/2 \rfloor} \mid s_j \neq s \text{ for any } 1 \leq j \leq \lfloor L/2 \rfloor\} = (1 - p(s))^{\lfloor L/2 \rfloor} \leq \left( (1 - a)^{1/2} \right)^{L-1}.$$

Thus, by Lemma 10 and Lemma 12, we have

$$H(S_1^L) - H^*(S_1^L) \leq 2n^2 \left( (1 - a)^{1/2} \right)^{L-1} \log(L + n) \to 0 \quad (L \to \infty).$$

However, in this case, $E^*(\mathbf{S}) = E(\mathbf{S})$ is obvious from $E^*(\mathbf{S}) \leq E(\mathbf{S})$ because $E(\mathbf{S}) = 0$.
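The two bounds used in this i.i.d. application can be checked by enumeration for small $L$. The Python sketch below is an illustration with assumed names and a hypothetical choice of symbol probabilities. It takes $\phi$ to order positions by value with ties broken by position. It computes $H(S_1^L) - H^*(S_1^L)$, the quantity $Q_L$, and the smallest $\varepsilon$ that satisfies the hypothesis of Lemma 12 for an i.i.d. process.

```python
from collections import defaultdict
from itertools import product
from math import log2


def phi(word):
    """Permutation type: positions sorted by value, ties broken by position."""
    return tuple(sorted(range(len(word)), key=lambda i: word[i]))


def check(probs, L):
    """Return (H - H*, Q_L, eps) for an i.i.d. process with all probs > 0."""
    n = len(probs)
    p_perm = defaultdict(float)  # p(pi)
    size = defaultdict(int)      # |phi^{-1}(pi)|
    H = 0.0
    for word in product(range(n), repeat=L):
        q = 1.0
        for s in word:
            q *= probs[s]
        pi = phi(word)
        p_perm[pi] += q
        size[pi] += 1
        H -= q * log2(q)
    H_star = sum(-q * log2(q) for q in p_perm.values())
    Q_L = sum(q for pi, q in p_perm.items() if size[pi] > 1)
    # Probability that a fixed symbol is absent from the first floor(L/2) positions,
    # maximised over the symbols.
    eps = max((1 - p) ** (L // 2) for p in probs)
    return H - H_star, Q_L, eps


if __name__ == "__main__":
    probs = [0.5, 0.5]  # hypothetical symbol probabilities
    n = len(probs)
    for L in (2, 4, 6, 8, 10):
        gap, Q_L, eps = check(probs, L)
        print(L, round(gap, 6), round(Q_L * n * log2(L + n), 6),
              round(Q_L, 6), round(2 * n * eps, 6))
```

Each printed row lists the entropy gap next to the Lemma 10 bound, followed by $Q_L$ next to the Lemma 12 bound $2n\varepsilon$.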

Let $\mathbf{S} = \{S_1, S_2, \cdots\}$ be a finite-state stationary ergodic Markov process with a set of states $A_n$ and a transition matrix $P = (p_{ij})$, where $p_{ij} \geq 0$ for all $1 \leq i, j \leq n$ and $\sum_{j=1}^{n} p_{ij} = 1$ for all $1 \leq i \leq n$. It is known that a finite-state stationary Markov process is ergodic if and only if its transition matrix $P$ is irreducible [19]: a matrix $P$ is irreducible if for all $1 \leq i, j \leq n$ there exists $l > 0$ such that $p_{ij}^{(l)} > 0$, where $p_{ij}^{(l)}$ is the $(i,j)$-th element of $P^l$. By the Perron-Frobenius theorem for irreducible non-negative matrices, there exists a unique stationary distribution $p = (p_1, \cdots, p_n)$ such that $p_i > 0$ for all $1 \leq i \leq n$, $\sum_{i=1}^{n} p_i = 1$ and $\sum_{i=1}^{n} p_i p_{ij} = p_j$ for all $1 \leq j \leq n$, namely, ${}^tP p = p$, where ${}^tP$ is the transpose of the matrix $P$. Then, we have $p(s_1^L) = p_{s_1} p_{s_1 s_2} \cdots p_{s_{L-1} s_L}$ for $s_1^L \in A_n^L$. The entropy rate $h(\mathbf{S})$ and the excess entropy $E(\mathbf{S})$ of a finite-state stationary Markov process $\mathbf{S} = \{S_1, S_2, \cdots\}$ are given by $h(\mathbf{S}) = -\sum_{i,j=1}^{n} p_i p_{ij} \log p_{ij}$ and $E(\mathbf{S}) = -\sum_{i=1}^{n} p_i \log p_i + \sum_{i,j=1}^{n} p_i p_{ij} \log p_{ij}$, respectively.
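The closed forms for $h(\mathbf{S})$ and $E(\mathbf{S})$ can be compared with block entropies computed by enumeration. The Python sketch below is an illustration only. The two-state transition matrix `P`, its stationary distribution `p` and the function names are hypothetical choices, and states are labelled from 0.

```python
from itertools import product
from math import log2

# Hypothetical two-state chain; each row of P sums to 1.
P = [[0.9, 0.1],
     [0.4, 0.6]]
# Stationary distribution, solved by hand from p P = p.
p = [0.8, 0.2]


def xlog2x(x):
    """x * log2(x) with the convention 0 * log 0 = 0."""
    return x * log2(x) if x > 0 else 0.0


def entropy_rate():
    """h = -sum_{i,j} p_i p_ij log p_ij."""
    n = len(P)
    return -sum(p[i] * xlog2x(P[i][j]) for i in range(n) for j in range(n))


def excess_entropy():
    """E = -sum_i p_i log p_i + sum_{i,j} p_i p_ij log p_ij."""
    return -sum(xlog2x(pi) for pi in p) - entropy_rate()


def block_entropy(L):
    """H(S_1^L) from p(s_1^L) = p_{s_1} p_{s_1 s_2} ... p_{s_{L-1} s_L}."""
    total = 0.0
    for word in product(range(len(P)), repeat=L):
        q = p[word[0]]
        for a, b in zip(word, word[1:]):
            q *= P[a][b]
        total -= xlog2x(q)
    return total


if __name__ == "__main__":
    h, E = entropy_rate(), excess_entropy()
    for L in range(1, 9):
        print(L, round(block_entropy(L) - h * L, 9), round(E, 9))
```

For a stationary Markov chain, $H(S_1^L) - h(\mathbf{S}) L$ takes the same value for every $L \geq 1$, so the two printed columns coincide up to rounding.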
Let $L$ be a positive integer. Let us put $N := \lfloor L/2 \rfloor$. Given a symbol $s \in A_n$, we would like to evaluate

$$\beta_s := \Pr\{s_1^N \mid s_j \neq s \text{ for any } 1 \leq j \leq N\} = \sum_{s_j \neq s,\ 1 \leq j \leq N} p(s_1 \cdots s_N) = \sum_{s_j \neq s,\ 1 \leq j \leq N} p_{s_1} p_{s_1 s_2} \cdots p_{s_{N-1} s_N}.$$

If $n = 1$, then $\beta_1 = 0$. So, this case is trivial. Hence, we assume $n \geq 2$ in the following discussion. If we introduce a matrix $P_s$ whose $(i,j)$-th elements are defined by

$$(P_s)_{ij} = \begin{cases} 0 & \text{if } i = s, \\ p_{ij} & \text{otherwise,} \end{cases}$$

then we can write

$$\beta_s = \langle (P_s)^{N-1} u_s, p \rangle,$$

where the vector $u_s = (u_1, \cdots, u_n)$ is defined by $u_i = 0$ if $i = s$ and $u_i = 1$ otherwise, and $\langle \cdot, \cdot \rangle$ is the usual inner product in the $n$-dimensional Euclidean space.

Since $P_s$ is a non-negative matrix, the following statements hold by the Perron-Frobenius theorem for non-negative matrices:

(i) There exists a non-negative eigenvalue $\lambda$ such that any other eigenvalue of $P_s$ has absolute value not greater than $\lambda$.

(ii) $\lambda \leq \max_i \left\{ \sum_{j=1}^{n} (P_s)_{ij} \right\} = 1$.

(iii) There exists a non-negative right eigenvector $v$ corresponding to the eigenvalue $\lambda$.

Lemma 13 $\lambda < 1$.

Proof. Suppose that $\lambda = 1$. Then, we have $P_s v = v$. For any positive integer $l$, we have

$$\langle v, p \rangle = \langle P_s^l v, p \rangle \leq \langle P^l v, p \rangle = \langle v, ({}^tP)^l p \rangle = \langle v, p \rangle,$$

since $P_s \leq P$ entrywise. Thus, we obtain $\langle (P^l - P_s^l) v, p \rangle = 0$, which implies that $(P^l - P_s^l) v = 0$ because $p$ is a positive vector and $(P^l - P_s^l) v$ is a non-negative vector. Now, let us fix any $1 \leq j \leq n$. There exists $l$ such that $p_{sj}^{(l)} > 0$ because $P$ is irreducible. Since the elements in the $s$-th row of the matrix $P_s^l$ are all 0, we have $\sum_{k=1}^{n} p_{sk}^{(l)} v_k = 0$, where we put $v = (v_1, \cdots, v_n)$. Thus, we obtain $v_j = 0$ because $p_{sj}^{(l)} > 0$, $p_{sk}^{(l)} \geq 0$ and $v_k \geq 0$ for all $1 \leq k \leq n$. Since $1 \leq j \leq n$ is arbitrary, $v = 0$ must hold. However, this contradicts $v \neq 0$ because $v$ is an eigenvector.

□
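The matrix expression for $\beta_s$ and its decay can be checked on a small example. The Python sketch below is an illustration under stated assumptions. The three-state transition matrix `P` is a hypothetical irreducible, doubly stochastic matrix, so the uniform distribution is stationary. States are labelled from 0, and the function names are ours. It compares a brute-force sum over words avoiding the symbol $s$ with $\langle (P_s)^{N-1} u_s, p \rangle$, where $P_s$ is $P$ with its $s$-th row set to zero.

```python
from itertools import product

# Hypothetical irreducible chain on three states (labels 0, 1, 2).
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
# P is doubly stochastic, so the uniform distribution is stationary.
p = [1 / 3, 1 / 3, 1 / 3]


def beta_brute(s, N):
    """Sum of p(s_1 ... s_N) over all words of length N that avoid the symbol s."""
    allowed = [a for a in range(len(P)) if a != s]
    total = 0.0
    for word in product(allowed, repeat=N):
        q = p[word[0]]
        for a, b in zip(word, word[1:]):
            q *= P[a][b]
        total += q
    return total


def beta_matrix(s, N):
    """<P_s^{N-1} u_s, p>, with row s of P zeroed and u_s the indicator of states != s."""
    n = len(P)
    Ps = [[0.0 if i == s else P[i][j] for j in range(n)] for i in range(n)]
    v = [0.0 if i == s else 1.0 for i in range(n)]
    for _ in range(N - 1):
        v = [sum(Ps[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * p[i] for i in range(n))


if __name__ == "__main__":
    s = 0
    for N in range(1, 11):
        b = beta_matrix(s, N)
        ratio = beta_matrix(s, N + 1) / b
        print(N, round(beta_brute(s, N), 9), round(b, 9), round(ratio, 6))
```

The last printed column is the ratio of successive values of $\beta_s$, which stays below 1 in this example, in line with Lemma 13.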

Now, let $P_s = S + T$ be the Jordan-Chevalley decomposition of the matrix $P_s$, where $S$ is a diagonalizable matrix and $T$ is a nilpotent matrix. Let $A$ be an invertible matrix such that $A^{-1} S A = D$, where $D$ is a diagonal matrix. Since $T$ is nilpotent, there exists a positive integer $k$ such that $T^k$ is a zero matrix. We also have $ST = TS$. If we put $E := A^{-1} T A$, then $E^k$ is a zero matrix and $DE = ED$. Thus, for sufficiently large $N$,

$$P_s^{N-1} = A (D + E)^{N-1} A^{-1} = A \left( \sum_{j=0}^{k-1} \binom{N-1}{j} D^{N-1-j} E^j \right) A^{-1} = \lambda^{N-k} O(N^{k-1}),$$

where the big-O notation $O(N^{k-1})$ for a matrix means that each element of the matrix is $O(N^{k-1})$. Hence, we obtain $\beta_s = \lambda^{N-k} O(N^{k-1})$. Since $0 \leq \lambda < 1$ by Lemma 13, $\beta_s$ decays exponentially in $N = \lfloor L/2 \rfloor$, so that Lemma 12 with $\varepsilon = \max_{s} \beta_s$ gives $Q_L \log L \to 0$. We therefore get the following theorem by combining Lemma 10 and Lemma 12.

Theorem 14 Let $\mathbf{S} = \{S_1, S_2, \cdots\}$ be a finite-state stationary ergodic Markov process. Then, the permutation excess entropy $E^*(\mathbf{S})$ exists and $E^*(\mathbf{S}) = E(\mathbf{S})$.

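We can immediately construct a finite-state stationary non-ergodic Markov process such that $E(\mathbf{S}) \neq E^*(\mathbf{S})$. For example, let $n = 2$ and

$$P = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

We choose a stationary distribution $p = (p_1, p_2) = (\frac{1}{2}, \frac{1}{2})$. Then we have $p(\underbrace{00 \cdots 0}_{L}) = p(\underbrace{11 \cdots 1}_{L}) = \frac{1}{2}$. Hence, we have $h(\mathbf{S}) = h^*(\mathbf{S}) = 0$ and $E(\mathbf{S}) = -p_1 \log p_1 - p_2 \log p_2 = 1$. On the other hand, we have $E^*(\mathbf{S}) = 0$ because $\phi(\underbrace{00 \cdots 0}_{L}) = \phi(\underbrace{11 \cdots 1}_{L}) \in S_L$.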
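The non-ergodic example can be reproduced in a few lines. The Python sketch below is an illustration with assumed names. It takes $\phi$ to order positions by value with ties broken by position, and it uses the fact that only the two constant words have positive probability.

```python
from collections import defaultdict
from math import log2


def phi(word):
    """Permutation type: positions sorted by value, ties broken by position."""
    return tuple(sorted(range(len(word)), key=lambda i: word[i]))


def entropies(L):
    """Return (H(S_1^L), H*(S_1^L)) for the process that emits 00...0 or 11...1,
    each with probability 1/2."""
    words = {(0,) * L: 0.5, (1,) * L: 0.5}
    H = sum(-q * log2(q) for q in words.values())
    p_perm = defaultdict(float)
    for word, q in words.items():
        p_perm[phi(word)] += q  # both constant words share the identity ordering
    H_star = sum(-q * log2(q) for q in p_perm.values())
    return H, H_star


if __name__ == "__main__":
    for L in (1, 2, 5, 10):
        print(L, entropies(L))
```

Since $h(\mathbf{S}) = h^*(\mathbf{S}) = 0$ here, the two returned values are $H(S_1^L) - h(\mathbf{S}) L$ and $H^*(S_1^L) - h^*(\mathbf{S}) L$, whose limits are the excess entropy and the permutation excess entropy of this process.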